Quantile Encoder: Tackling High Cardinality Categorical Features in Regression Problems
Authors
Abstract
Regression problems have been widely studied in machine learning literature, resulting in a plethora of regression models and performance measures. However, there are few techniques specially dedicated to solving the problem of how to incorporate categorical features into regression problems. Usually, categorical feature encoders are general enough to cover both classification and regression problems. This lack of specificity results in underperforming regression models. In this paper, we provide an in-depth analysis of how to tackle high cardinality categorical features with the quantile. Our proposal outperforms state-of-the-art encoders, including the traditional statistical mean target encoder, when considering the Mean Absolute Error, especially in the presence of long-tailed or skewed distributions. Besides, to deal with possible overfitting when there are categories with small support, our encoder benefits from additive smoothing. Finally, we describe how to expand the encoded values by creating a set of features with different quantiles. This expanded encoder provides a more informative output about the categorical feature in question, further boosting the performance of the regression model.
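The abstract describes the encoder only in words. The following is a minimal sketch, assuming an m-estimate-style form of additive smoothing that blends each category's target quantile with the global target quantile, weighted by the category's support n and a smoothing weight m (the exact formula is not given in this abstract). The function names `quantile`, `fit_quantile_encoder`, and `expand_quantiles` are hypothetical, and the linear-interpolation quantile rule is an assumed choice, not necessarily the one used in the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics
    (the same rule as NumPy's default 'linear' method)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of an empty sequence")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_quantile_encoder(
    categories: Sequence[Hashable], targets: Sequence[float], q: float = 0.5, m: float = 1.0
) -> Tuple[Dict[Hashable, float], float]:
    """Map each category to a smoothed target quantile.

    Assumed smoothing: (n * q_category + m * q_global) / (n + m),
    so categories with small support n are pulled toward the global quantile.
    """
    global_q = quantile(targets, q)
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for c, y in zip(categories, targets):
        groups[c].append(y)
    mapping = {
        c: (len(ys) * quantile(ys, q) + m * global_q) / (len(ys) + m)
        for c, ys in groups.items()
    }
    return mapping, global_q


def expand_quantiles(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    qs: Sequence[float] = (0.25, 0.5, 0.75),
    m: float = 1.0,
) -> List[List[float]]:
    """Expanded encoder: one encoded column per quantile in qs.
    Unseen categories would fall back to the global quantile."""
    fitted = [fit_quantile_encoder(categories, targets, q, m) for q in qs]
    return [[mp.get(c, g) for mp, g in fitted] for c in categories]
```

For example, with categories `["a", "a", "a", "b", "b"]` and targets `[1, 2, 10, 4, 6]`, the global median is 4 and category `a` encodes to 2.5 with `m=1`, whereas its plain target mean would be about 4.33 because of the outlier 10. This illustrates why a quantile can be more robust than the mean under skewed targets; `expand_quantiles` then yields the row `[1.625, 2.5, 6.0]` for `a`.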
Similar resources
Local Polynomial Quantile Regression With Parametric Features
We propose a new approach to conditional quantile function estimation that combines both parametric and nonparametric techniques. At each design point, a global, possibly incorrect, pilot parametric model is locally adjusted through a kernel smoothing fit. The resulting quantile regression estimator behaves like a parametric estimator when the latter is correct and converges to the nonparametri...
High-Dimensional Structured Quantile Regression
Quantile regression aims at modeling the conditional median and quantiles of a response variable given certain predictor variables. In this work we consider the problem of linear quantile regression in high dimensions where the number of predictor variables is much higher than the number of samples available for parameter estimation. We assume the true parameter to have some structure character...
Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries
Interactive analytics increasingly involves querying for quantiles over specific sub-populations and time windows of high cardinality datasets. Data processing engines such as Druid and Spark use mergeable summaries to estimate quantiles on these large datasets, but summary merge times are a bottleneck during high-cardinality aggregation. We show how a compact and efficiently mergeable quantile...
Quantile-based categorical statistics
Traditional point-to-point verification is more and more superseded by situation-based verification such as an object-oriented mode. One main reason is that difficulties are encountered while interpreting the outcome of a conventional contingency table based on amplitude thresholds. Firstly, a predetermined amplitude threshold splits the distributions under comparison at an unknown location. In...
Extremal Quantile Regression
Quantile regression is an important tool for estimation of conditional quantiles of a response Y given a vector of covariates X. It can be used to measure the effect of covariates not only in the center of a distribution, but also in the upper and lower tails. This paper develops a theory of quantile regression in the tails. Specifically, it obtains the large sample properties of extremal (ext...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-85529-1_14